Overview

Dataset statistics

Number of variables: 13
Number of observations: 2969
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 301.7 KiB
Average record size in memory: 104.0 B
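
The average record size can be cross-checked from the total size and the number of observations. The sketch below uses only the figures reported in this table; the note that 13 columns at 8 bytes each gives 104 B assumes every column is stored as a 64-bit number, which the report does not state explicitly.

```python
# Figures copied from the dataset statistics table.
total_size_kib = 301.7
n_observations = 2969
n_variables = 13

# 1 KiB = 1024 bytes. The total is rounded to one decimal in the report,
# so the result only approximates the reported 104.0 B.
avg_record_bytes = total_size_kib * 1024 / n_observations

# Assumption: each of the 13 numeric columns uses 8 bytes per value.
assumed_bytes_per_row = n_variables * 8

print(f"average record size ~ {avg_record_bytes:.2f} B")
print(f"13 columns x 8 bytes = {assumed_bytes_per_row} B")
```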

Variable types

Numeric: 13

Alerts

gross_revenue is highly correlated with qtde_invoices and 2 other fields [High correlation]
recency_days is highly correlated with qtde_invoices [High correlation]
qtde_invoices is highly correlated with gross_revenue and 4 other fields [High correlation]
qtde_items is highly correlated with gross_revenue and 2 other fields [High correlation]
qtde_products is highly correlated with gross_revenue and 3 other fields [High correlation]
avg_ticket is highly correlated with avg_unique_basket_size [High correlation]
avg_recency_days is highly correlated with frequency [High correlation]
frequency is highly correlated with avg_recency_days [High correlation]
avg_basket_size is highly correlated with qtde_invoices and 1 other fields [High correlation]
avg_unique_basket_size is highly correlated with qtde_products and 2 other fields [High correlation]
gross_revenue is highly correlated with qtde_invoices and 1 other fields [High correlation]
qtde_invoices is highly correlated with gross_revenue and 2 other fields [High correlation]
qtde_items is highly correlated with gross_revenue and 1 other fields [High correlation]
qtde_products is highly correlated with qtde_invoices [High correlation]
avg_ticket is highly correlated with qtde_retruns [High correlation]
qtde_retruns is highly correlated with avg_ticket [High correlation]
avg_basket_size is highly correlated with avg_unique_basket_size [High correlation]
avg_unique_basket_size is highly correlated with avg_basket_size [High correlation]
gross_revenue is highly correlated with qtde_invoices and 2 other fields [High correlation]
qtde_invoices is highly correlated with gross_revenue and 3 other fields [High correlation]
qtde_items is highly correlated with gross_revenue and 2 other fields [High correlation]
qtde_products is highly correlated with gross_revenue and 2 other fields [High correlation]
avg_recency_days is highly correlated with frequency [High correlation]
frequency is highly correlated with avg_recency_days [High correlation]
avg_basket_size is highly correlated with qtde_invoices [High correlation]
gross_revenue is highly correlated with qtde_invoices and 4 other fields [High correlation]
qtde_invoices is highly correlated with gross_revenue and 2 other fields [High correlation]
qtde_items is highly correlated with gross_revenue and 4 other fields [High correlation]
qtde_products is highly correlated with gross_revenue and 2 other fields [High correlation]
avg_ticket is highly correlated with gross_revenue and 2 other fields [High correlation]
qtde_retruns is highly correlated with gross_revenue and 2 other fields [High correlation]
avg_basket_size is highly correlated with avg_unique_basket_size [High correlation]
avg_unique_basket_size is highly correlated with avg_basket_size [High correlation]
avg_ticket is highly skewed (γ1 = 53.44422362) [Skewed]
frequency is highly skewed (γ1 = 24.88049136) [Skewed]
qtde_retruns is highly skewed (γ1 = 51.79774426) [Skewed]
df_index has unique values [Unique]
customer_id has unique values [Unique]
recency_days has 34 (1.1%) zeros [Zeros]
qtde_retruns has 1481 (49.9%) zeros [Zeros]
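
To make the skewness and zeros alerts concrete, here is a minimal sketch in plain Python. The `skewness` helper is a hypothetical name and uses the simple moment-based definition m3 / m2^1.5; the report may use a bias-corrected estimator, so this illustrates what γ1 measures rather than reproducing the report's exact values. The zero shares are recomputed from the counts in the alerts.

```python
def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (illustrative, not bias-corrected)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Toy data, not taken from the report.
symmetric = [1, 2, 3, 4, 5]    # no tail on either side
long_tail = [1, 1, 1, 1, 50]   # one large value pulls the tail to the right

print(skewness(symmetric), skewness(long_tail))

# Share of zeros, using the counts from the alerts and 2969 observations.
n_observations = 2969
zeros_recency_pct = round(100 * 34 / n_observations, 1)
zeros_returns_pct = round(100 * 1481 / n_observations, 1)
print(zeros_recency_pct, zeros_returns_pct)
```

A large positive γ1, as reported for avg_ticket, frequency and qtde_retruns, indicates a few very large values far above the bulk of the data.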

Reproduction

Analysis started: 2021-11-20 17:04:11.988771
Analysis finished: 2021-11-20 17:04:55.140669
Duration: 43.15 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json
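
The reported duration follows from the two timestamps. A small check using only the standard library, with the timestamps copied from the lines above:

```python
from datetime import datetime

# Timestamps as printed in the Reproduction section.
started = datetime.fromisoformat("2021-11-20 17:04:11.988771")
finished = datetime.fromisoformat("2021-11-20 17:04:55.140669")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```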

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 2969
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2317.292354
Minimum: 0
Maximum: 5715
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 185.4
Q1: 929
median: 2120
Q3: 3537
95-th percentile: 5035.2
Maximum: 5715
Range: 5715
Interquartile range (IQR): 2608

Descriptive statistics

Standard deviation: 1554.944589
Coefficient of variation (CV): 0.6710178739
Kurtosis: -1.010787014
Mean: 2317.292354
Median Absolute Deviation (MAD): 1271
Skewness: 0.342284058
Sum: 6880041
Variance: 2417852.674
Monotonicity: Strictly increasing
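
Several of these statistics are derived from the others, and the same relations hold for every variable in the report. The sketch below recomputes them for df_index using the standard definitions (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean x number of observations). The inputs are copied from the values above, so small rounding differences are expected.

```python
# Reported values for df_index.
mean = 2317.292354
std = 1554.944589
q1, q3 = 929, 3537
minimum, maximum = 0, 5715
n_observations = 2969

value_range = maximum - minimum   # reported Range: 5715
iqr = q3 - q1                     # reported IQR: 2608
cv = std / mean                   # reported CV: 0.6710178739
variance = std ** 2               # reported Variance: 2417852.674
total = mean * n_observations     # reported Sum: 6880041

print(value_range, iqr, round(cv, 6), round(variance, 1), round(total))
```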
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 1 | < 0.1%
3011 | 1 | < 0.1%
2996 | 1 | < 0.1%
2999 | 1 | < 0.1%
3000 | 1 | < 0.1%
3001 | 1 | < 0.1%
3002 | 1 | < 0.1%
3005 | 1 | < 0.1%
3007 | 1 | < 0.1%
3008 | 1 | < 0.1%
Other values (2959) | 2959 | 99.7%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%

Value | Count | Frequency (%)
5715 | 1 | < 0.1%
5696 | 1 | < 0.1%
5686 | 1 | < 0.1%
5680 | 1 | < 0.1%
5659 | 1 | < 0.1%
5655 | 1 | < 0.1%
5649 | 1 | < 0.1%
5638 | 1 | < 0.1%
5637 | 1 | < 0.1%
5627 | 1 | < 0.1%

customer_id
Real number (ℝ≥0)

UNIQUE

Distinct: 2969
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15270.77299
Minimum: 12347
Maximum: 18287
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 12347
5-th percentile: 12619.4
Q1: 13799
median: 15221
Q3: 16768
95-th percentile: 17964.6
Maximum: 18287
Range: 5940
Interquartile range (IQR): 2969

Descriptive statistics

Standard deviation: 1718.990292
Coefficient of variation (CV): 0.1125673398
Kurtosis: -1.206094692
Mean: 15270.77299
Median Absolute Deviation (MAD): 1488
Skewness: 0.03160785866
Sum: 45338925
Variance: 2954927.624
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
17850 | 1 | < 0.1%
17588 | 1 | < 0.1%
14905 | 1 | < 0.1%
16103 | 1 | < 0.1%
14626 | 1 | < 0.1%
14868 | 1 | < 0.1%
18246 | 1 | < 0.1%
17115 | 1 | < 0.1%
16611 | 1 | < 0.1%
15912 | 1 | < 0.1%
Other values (2959) | 2959 | 99.7%

Value | Count | Frequency (%)
12347 | 1 | < 0.1%
12348 | 1 | < 0.1%
12352 | 1 | < 0.1%
12356 | 1 | < 0.1%
12358 | 1 | < 0.1%
12359 | 1 | < 0.1%
12360 | 1 | < 0.1%
12362 | 1 | < 0.1%
12364 | 1 | < 0.1%
12370 | 1 | < 0.1%

Value | Count | Frequency (%)
18287 | 1 | < 0.1%
18283 | 1 | < 0.1%
18282 | 1 | < 0.1%
18277 | 1 | < 0.1%
18276 | 1 | < 0.1%
18274 | 1 | < 0.1%
18273 | 1 | < 0.1%
18272 | 1 | < 0.1%
18270 | 1 | < 0.1%
18269 | 1 | < 0.1%

gross_revenue
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2954
Distinct (%): 99.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2749.321711
Minimum: 6.2
Maximum: 279138.02
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 6.2
5-th percentile: 229.77
Q1: 570.96
median: 1086.92
Q3: 2308.06
95-th percentile: 7219.68
Maximum: 279138.02
Range: 279131.82
Interquartile range (IQR): 1737.1

Descriptive statistics

Standard deviation: 10580.62331
Coefficient of variation (CV): 3.848448607
Kurtosis: 353.944724
Mean: 2749.321711
Median Absolute Deviation (MAD): 672.16
Skewness: 16.77755612
Sum: 8162736.16
Variance: 111949589.6
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
178.96 | 2 | 0.1%
533.33 | 2 | 0.1%
889.93 | 2 | 0.1%
2053.02 | 2 | 0.1%
745.06 | 2 | 0.1%
379.65 | 2 | 0.1%
2092.32 | 2 | 0.1%
731.9 | 2 | 0.1%
1353.74 | 2 | 0.1%
331 | 2 | 0.1%
Other values (2944) | 2949 | 99.3%

Value | Count | Frequency (%)
6.2 | 1 | < 0.1%
13.3 | 1 | < 0.1%
15 | 1 | < 0.1%
36.56 | 1 | < 0.1%
45 | 1 | < 0.1%
52 | 1 | < 0.1%
52.2 | 1 | < 0.1%
52.2 | 1 | < 0.1%
62.43 | 1 | < 0.1%
68.84 | 1 | < 0.1%

Value | Count | Frequency (%)
279138.02 | 1 | < 0.1%
259657.3 | 1 | < 0.1%
194550.79 | 1 | < 0.1%
168472.5 | 1 | < 0.1%
140450.72 | 1 | < 0.1%
124564.53 | 1 | < 0.1%
117379.63 | 1 | < 0.1%
91062.38 | 1 | < 0.1%
72882.09 | 1 | < 0.1%
66653.56 | 1 | < 0.1%

recency_days
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 272
Distinct (%): 9.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 64.28763894
Minimum: 0
Maximum: 373
Zeros: 34
Zeros (%): 1.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 11
median: 31
Q3: 81
95-th percentile: 242
Maximum: 373
Range: 373
Interquartile range (IQR): 70

Descriptive statistics

Standard deviation: 77.75677911
Coefficient of variation (CV): 1.209513686
Kurtosis: 2.777962659
Mean: 64.28763894
Median Absolute Deviation (MAD): 26
Skewness: 1.798379538
Sum: 190870
Variance: 6046.116697
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 99 | 3.3%
4 | 87 | 2.9%
2 | 85 | 2.9%
3 | 85 | 2.9%
8 | 76 | 2.6%
10 | 67 | 2.3%
9 | 66 | 2.2%
7 | 66 | 2.2%
17 | 64 | 2.2%
16 | 55 | 1.9%
Other values (262) | 2219 | 74.7%

Value | Count | Frequency (%)
0 | 34 | 1.1%
1 | 99 | 3.3%
2 | 85 | 2.9%
3 | 85 | 2.9%
4 | 87 | 2.9%
5 | 43 | 1.4%
7 | 66 | 2.2%
8 | 76 | 2.6%
9 | 66 | 2.2%
10 | 67 | 2.3%

Value | Count | Frequency (%)
373 | 2 | 0.1%
372 | 4 | 0.1%
371 | 1 | < 0.1%
368 | 1 | < 0.1%
366 | 4 | 0.1%
365 | 2 | 0.1%
364 | 1 | < 0.1%
360 | 1 | < 0.1%
359 | 1 | < 0.1%
358 | 4 | 0.1%

qtde_invoices
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 56
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.723139104
Minimum: 1
Maximum: 206
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 4
Q3: 6
95-th percentile: 17
Maximum: 206
Range: 205
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 8.85653132
Coefficient of variation (CV): 1.547495379
Kurtosis: 190.8344494
Mean: 5.723139104
Median Absolute Deviation (MAD): 2
Skewness: 10.76680458
Sum: 16992
Variance: 78.43814702
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
2 | 785 | 26.4%
3 | 499 | 16.8%
4 | 393 | 13.2%
5 | 237 | 8.0%
1 | 190 | 6.4%
6 | 173 | 5.8%
7 | 138 | 4.6%
8 | 98 | 3.3%
9 | 69 | 2.3%
10 | 55 | 1.9%
Other values (46) | 332 | 11.2%

Value | Count | Frequency (%)
1 | 190 | 6.4%
2 | 785 | 26.4%
3 | 499 | 16.8%
4 | 393 | 13.2%
5 | 237 | 8.0%
6 | 173 | 5.8%
7 | 138 | 4.6%
8 | 98 | 3.3%
9 | 69 | 2.3%
10 | 55 | 1.9%

Value | Count | Frequency (%)
206 | 1 | < 0.1%
199 | 1 | < 0.1%
124 | 1 | < 0.1%
97 | 1 | < 0.1%
91 | 2 | 0.1%
86 | 1 | < 0.1%
72 | 1 | < 0.1%
62 | 2 | 0.1%
60 | 1 | < 0.1%
57 | 1 | < 0.1%

qtde_items
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 1671
Distinct (%): 56.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1608.852476
Minimum: 1
Maximum: 196844
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 102.4
Q1: 296
median: 641
Q3: 1401
95-th percentile: 4407.4
Maximum: 196844
Range: 196843
Interquartile range (IQR): 1105

Descriptive statistics

Standard deviation: 5887.578045
Coefficient of variation (CV): 3.659489067
Kurtosis: 465.998084
Mean: 1608.852476
Median Absolute Deviation (MAD): 422
Skewness: 17.85859125
Sum: 4776683
Variance: 34663575.24
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
310 | 11 | 0.4%
150 | 9 | 0.3%
88 | 9 | 0.3%
246 | 8 | 0.3%
272 | 8 | 0.3%
84 | 8 | 0.3%
260 | 8 | 0.3%
288 | 8 | 0.3%
1200 | 7 | 0.2%
516 | 7 | 0.2%
Other values (1661) | 2886 | 97.2%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 2 | 0.1%
12 | 2 | 0.1%
16 | 1 | < 0.1%
17 | 1 | < 0.1%
18 | 1 | < 0.1%
19 | 1 | < 0.1%
20 | 1 | < 0.1%
23 | 1 | < 0.1%
25 | 1 | < 0.1%

Value | Count | Frequency (%)
196844 | 1 | < 0.1%
80997 | 1 | < 0.1%
80263 | 1 | < 0.1%
77373 | 1 | < 0.1%
69993 | 1 | < 0.1%
64549 | 1 | < 0.1%
64124 | 1 | < 0.1%
63312 | 1 | < 0.1%
58343 | 1 | < 0.1%
57885 | 1 | < 0.1%

qtde_products
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 468
Distinct (%): 15.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 122.7241495
Minimum: 1
Maximum: 7838
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 9
Q1: 29
median: 67
Q3: 135
95-th percentile: 382
Maximum: 7838
Range: 7837
Interquartile range (IQR): 106

Descriptive statistics

Standard deviation: 269.8964081
Coefficient of variation (CV): 2.199211884
Kurtosis: 354.8611303
Mean: 122.7241495
Median Absolute Deviation (MAD): 44
Skewness: 15.70763473
Sum: 364368
Variance: 72844.07112
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
28 | 43 | 1.4%
20 | 37 | 1.2%
35 | 35 | 1.2%
29 | 35 | 1.2%
19 | 34 | 1.1%
15 | 33 | 1.1%
11 | 32 | 1.1%
26 | 31 | 1.0%
27 | 30 | 1.0%
25 | 30 | 1.0%
Other values (458) | 2629 | 88.5%

Value | Count | Frequency (%)
1 | 6 | 0.2%
2 | 14 | 0.5%
3 | 16 | 0.5%
4 | 17 | 0.6%
5 | 26 | 0.9%
6 | 29 | 1.0%
7 | 18 | 0.6%
8 | 19 | 0.6%
9 | 26 | 0.9%
10 | 28 | 0.9%

Value | Count | Frequency (%)
7838 | 1 | < 0.1%
5673 | 1 | < 0.1%
5095 | 1 | < 0.1%
4580 | 1 | < 0.1%
2698 | 1 | < 0.1%
2379 | 1 | < 0.1%
2060 | 1 | < 0.1%
1818 | 1 | < 0.1%
1673 | 1 | < 0.1%
1637 | 1 | < 0.1%

avg_ticket
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 2966
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51.89776151
Minimum: 2.150588235
Maximum: 56157.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 2.150588235
5-th percentile: 4.916661099
Q1: 13.11933333
median: 17.95658654
Q3: 24.98828571
95-th percentile: 90.497
Maximum: 56157.5
Range: 56155.34941
Interquartile range (IQR): 11.86895238

Descriptive statistics

Standard deviation: 1036.934407
Coefficient of variation (CV): 19.98033011
Kurtosis: 2890.707126
Mean: 51.89776151
Median Absolute Deviation (MAD): 5.984842033
Skewness: 53.44422362
Sum: 154084.4539
Variance: 1075232.964
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
15 | 2 | 0.1%
4.162 | 2 | 0.1%
14.47833333 | 2 | 0.1%
18.15222222 | 1 | < 0.1%
13.92736842 | 1 | < 0.1%
36.24411765 | 1 | < 0.1%
29.78416667 | 1 | < 0.1%
22.8792623 | 1 | < 0.1%
20.51104167 | 1 | < 0.1%
149.025 | 1 | < 0.1%
Other values (2956) | 2956 | 99.6%

Value | Count | Frequency (%)
2.150588235 | 1 | < 0.1%
2.4325 | 1 | < 0.1%
2.462371134 | 1 | < 0.1%
2.511241379 | 1 | < 0.1%
2.515333333 | 1 | < 0.1%
2.65 | 1 | < 0.1%
2.656931818 | 1 | < 0.1%
2.707598253 | 1 | < 0.1%
2.760621572 | 1 | < 0.1%
2.770464191 | 1 | < 0.1%

Value | Count | Frequency (%)
56157.5 | 1 | < 0.1%
4453.43 | 1 | < 0.1%
3202.92 | 1 | < 0.1%
1687.2 | 1 | < 0.1%
952.9875 | 1 | < 0.1%
872.13 | 1 | < 0.1%
841.0214493 | 1 | < 0.1%
651.1683333 | 1 | < 0.1%
640 | 1 | < 0.1%
624.4 | 1 | < 0.1%

avg_recency_days
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 1258
Distinct (%): 42.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 67.34851138
Minimum: 1
Maximum: 366
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 25.92307692
median: 48.28571429
Q3: 85.33333333
95-th percentile: 201
Maximum: 366
Range: 365
Interquartile range (IQR): 59.41025641

Descriptive statistics

Standard deviation: 63.54492876
Coefficient of variation (CV): 0.9435238799
Kurtosis: 4.887109087
Mean: 67.34851138
Median Absolute Deviation (MAD): 26.28571429
Skewness: 2.062770925
Sum: 199957.7303
Variance: 4037.957972
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
14 | 25 | 0.8%
4 | 22 | 0.7%
70 | 21 | 0.7%
7 | 20 | 0.7%
35 | 19 | 0.6%
49 | 18 | 0.6%
46 | 17 | 0.6%
21 | 17 | 0.6%
11 | 17 | 0.6%
42 | 16 | 0.5%
Other values (1248) | 2777 | 93.5%

Value | Count | Frequency (%)
1 | 16 | 0.5%
1.5 | 1 | < 0.1%
2 | 13 | 0.4%
2.5 | 1 | < 0.1%
2.601398601 | 1 | < 0.1%
3 | 15 | 0.5%
3.321428571 | 1 | < 0.1%
3.330357143 | 1 | < 0.1%
3.5 | 2 | 0.1%
4 | 22 | 0.7%

Value | Count | Frequency (%)
366 | 1 | < 0.1%
365 | 1 | < 0.1%
363 | 1 | < 0.1%
362 | 1 | < 0.1%
357 | 2 | 0.1%
356 | 1 | < 0.1%
355 | 2 | 0.1%
352 | 1 | < 0.1%
351 | 2 | 0.1%
350 | 3 | 0.1%

frequency
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 1225
Distinct (%): 41.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1137973039
Minimum: 0.005449591281
Maximum: 17
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 0.005449591281
5-th percentile: 0.008894164194
Q1: 0.01633986928
median: 0.02588996764
Q3: 0.04945054945
95-th percentile: 1
Maximum: 17
Range: 16.99455041
Interquartile range (IQR): 0.03311068017

Descriptive statistics

Standard deviation: 0.4081562524
Coefficient of variation (CV): 3.586695275
Kurtosis: 989.3650758
Mean: 0.1137973039
Median Absolute Deviation (MAD): 0.0121913375
Skewness: 24.88049136
Sum: 337.8641954
Variance: 0.1665915263
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 198 | 6.7%
0.0625 | 18 | 0.6%
0.02777777778 | 17 | 0.6%
0.02380952381 | 16 | 0.5%
0.09090909091 | 15 | 0.5%
0.08333333333 | 15 | 0.5%
0.03448275862 | 14 | 0.5%
0.02941176471 | 14 | 0.5%
0.03571428571 | 13 | 0.4%
0.07692307692 | 13 | 0.4%
Other values (1215) | 2636 | 88.8%

Value | Count | Frequency (%)
0.005449591281 | 1 | < 0.1%
0.005464480874 | 1 | < 0.1%
0.005479452055 | 1 | < 0.1%
0.005494505495 | 1 | < 0.1%
0.005586592179 | 2 | 0.1%
0.005602240896 | 1 | < 0.1%
0.005617977528 | 2 | 0.1%
0.00566572238 | 1 | < 0.1%
0.005681818182 | 2 | 0.1%
0.005698005698 | 3 | 0.1%

Value | Count | Frequency (%)
17 | 1 | < 0.1%
3 | 1 | < 0.1%
2 | 6 | 0.2%
1.142857143 | 1 | < 0.1%
1 | 198 | 6.7%
0.75 | 1 | < 0.1%
0.6666666667 | 3 | 0.1%
0.550802139 | 1 | < 0.1%
0.5335120643 | 1 | < 0.1%
0.5 | 3 | 0.1%

qtde_retruns
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct: 214
Distinct (%): 7.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 62.1569552
Minimum: 0
Maximum: 80995
Zeros: 1481
Zeros (%): 49.9%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 9
95-th percentile: 100.6
Maximum: 80995
Range: 80995
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 1512.496135
Coefficient of variation (CV): 24.33349783
Kurtosis: 2765.52864
Mean: 62.1569552
Median Absolute Deviation (MAD): 1
Skewness: 51.79774426
Sum: 184544
Variance: 2287644.557
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 1481 | 49.9%
1 | 164 | 5.5%
2 | 148 | 5.0%
3 | 105 | 3.5%
4 | 89 | 3.0%
6 | 78 | 2.6%
5 | 61 | 2.1%
12 | 51 | 1.7%
8 | 43 | 1.4%
7 | 43 | 1.4%
Other values (204) | 706 | 23.8%

Value | Count | Frequency (%)
0 | 1481 | 49.9%
1 | 164 | 5.5%
2 | 148 | 5.0%
3 | 105 | 3.5%
4 | 89 | 3.0%
5 | 61 | 2.1%
6 | 78 | 2.6%
7 | 43 | 1.4%
8 | 43 | 1.4%
9 | 41 | 1.4%

Value | Count | Frequency (%)
80995 | 1 | < 0.1%
9014 | 1 | < 0.1%
8004 | 1 | < 0.1%
4427 | 1 | < 0.1%
3768 | 1 | < 0.1%
3332 | 1 | < 0.1%
2878 | 1 | < 0.1%
2022 | 1 | < 0.1%
2012 | 1 | < 0.1%
1776 | 1 | < 0.1%

avg_basket_size
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 268
Distinct (%): 9.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.118282714
Minimum: 0.1764705882
Maximum: 16
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 0.1764705882
5-th percentile: 0.95
Q1: 1.8
median: 2.75
Q3: 4
95-th percentile: 6.5
Maximum: 16
Range: 15.82352941
Interquartile range (IQR): 2.2

Descriptive statistics

Standard deviation: 1.833753643
Coefficient of variation (CV): 0.5880652306
Kurtosis: 3.641104541
Mean: 3.118282714
Median Absolute Deviation (MAD): 1.083333333
Skewness: 1.430692644
Sum: 9258.181378
Variance: 3.362652424
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
3 | 204 | 6.9%
2 | 200 | 6.7%
4 | 163 | 5.5%
5 | 142 | 4.8%
3.5 | 136 | 4.6%
4.5 | 113 | 3.8%
2.5 | 111 | 3.7%
6 | 82 | 2.8%
1 | 71 | 2.4%
3.333333333 | 71 | 2.4%
Other values (258) | 1676 | 56.4%

Value | Count | Frequency (%)
0.1764705882 | 1 | < 0.1%
0.2211055276 | 1 | < 0.1%
0.2727272727 | 1 | < 0.1%
0.2766990291 | 1 | < 0.1%
0.2790697674 | 1 | < 0.1%
0.2820512821 | 1 | < 0.1%
0.3306451613 | 1 | < 0.1%
0.3333333333 | 4 | 0.1%
0.3402061856 | 1 | < 0.1%
0.3626373626 | 1 | < 0.1%

Value | Count | Frequency (%)
16 | 1 | < 0.1%
14 | 3 | 0.1%
13.5 | 1 | < 0.1%
12 | 1 | < 0.1%
11 | 9 | 0.3%
10 | 7 | 0.2%
9.5 | 1 | < 0.1%
9 | 16 | 0.5%
8.5 | 2 | 0.1%
8 | 34 | 1.1%

avg_unique_basket_size
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 906
Distinct (%): 30.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.48459137
Minimum: 0.2
Maximum: 259
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 0.2
5-th percentile: 2
Q1: 7.666666667
median: 13.6
Q3: 22.14285714
95-th percentile: 46
Maximum: 259
Range: 258.8
Interquartile range (IQR): 14.47619048

Descriptive statistics

Standard deviation: 15.46030748
Coefficient of variation (CV): 0.8842246955
Kurtosis: 29.31744084
Mean: 17.48459137
Median Absolute Deviation (MAD): 6.6
Skewness: 3.43586152
Sum: 51911.75179
Variance: 239.0211074
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
13 | 42 | 1.4%
9 | 41 | 1.4%
8 | 39 | 1.3%
16 | 39 | 1.3%
14 | 38 | 1.3%
17 | 38 | 1.3%
5 | 36 | 1.2%
11 | 36 | 1.2%
7 | 36 | 1.2%
15 | 35 | 1.2%
Other values (896) | 2589 | 87.2%

Value | Count | Frequency (%)
0.2 | 1 | < 0.1%
0.25 | 3 | 0.1%
0.3333333333 | 6 | 0.2%
0.4 | 1 | < 0.1%
0.4090909091 | 1 | < 0.1%
0.5 | 12 | 0.4%
0.5454545455 | 1 | < 0.1%
0.5555555556 | 1 | < 0.1%
0.5714285714 | 1 | < 0.1%
0.6176470588 | 1 | < 0.1%

Value | Count | Frequency (%)
259 | 1 | < 0.1%
177 | 1 | < 0.1%
148 | 1 | < 0.1%
127 | 1 | < 0.1%
105 | 1 | < 0.1%
104 | 1 | < 0.1%
101 | 1 | < 0.1%
98 | 1 | < 0.1%
95.5 | 1 | < 0.1%
94.33333333 | 1 | < 0.1%

Interactions

2021-11-20T14:04:40.385158image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:42.543334image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:44.743824image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:46.870753image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:49.138838image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:51.492081image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:54.040939image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:19.791877image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:21.610080image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:23.566167image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:34.280448image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:36.348347image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:38.494915image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:40.568131image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:42.726446image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:44.918371image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:47.042188image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:49.311157image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-11-20T14:04:51.660654image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
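
The calculation described above can be sketched in plain Python. This is only an illustration, not the code the report uses: the helper names `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical, and giving tied values the average of their positions is an assumption about tie handling.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations (1/n factors cancel)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship (made-up data): y = x^3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)  # ranks agree perfectly
r = pearson_r(x, y)       # raw values are not on a straight line
```

On this toy input the ranks of X and Y are identical, so ρ reaches its maximum, while r on the raw values stays below 1 because the relationship is not linear.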

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exactly linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
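
A minimal plain-Python sketch of this formula follows, together with a check of the invariance property. The function name `pearson_r` and the sample values are made up for illustration; they are not taken from the report or its dataset.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors of the covariance and the variances cancel, so sums suffice.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

r_original = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson_r([10 * a + 5 for a in x], [0.5 * b - 3 for b in y])

# Exactly linear relationships with very different slopes.
r_steep = pearson_r(x, [100 * a for a in x])
r_shallow = pearson_r(x, [0.01 * a for a in x])
```

Here `r_original` and `r_rescaled` agree up to floating-point error, and both `r_steep` and `r_shallow` equal 1 up to rounding, which is the slope independence mentioned above.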

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
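
The pair counting described above can be sketched as follows. This implements the formula exactly as stated, with no correction for ties, which may differ from the variant used to produce the report. The name `kendall_tau` and the sample data are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total pairs, without any tie correction."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # The sign of this product says whether the pair is ordered
        # the same way in x and in y.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie in x or y; it counts toward neither.
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau(x, y)  # 5 concordant pairs, 1 discordant pair, 6 pairs in total
```

For this input only the pair formed by the second and third observations is discordant, so τ = (5 - 1) / 6 ≈ 0.667.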

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

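| | df_index | customer_id | gross_revenue | recency_days | qtde_invoices | qtde_items | qtde_products | avg_ticket | avg_recency_days | frequency | qtde_retruns | avg_basket_size | avg_unique_basket_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 17850 | 5,391.2100 | 372.0000 | 34.0000 | 1,733.0000 | 297.0000 | 18.1522 | 35.5000 | 17.0000 | 40.0000 | 0.1765 | 0.6176 |
| 1 | 1 | 13047 | 3,232.5900 | 56.0000 | 9.0000 | 1,390.0000 | 171.0000 | 18.9040 | 27.2500 | 0.0283 | 35.0000 | 1.2222 | 11.6667 |
| 2 | 2 | 12583 | 6,705.3800 | 2.0000 | 15.0000 | 5,028.0000 | 232.0000 | 28.9025 | 23.1875 | 0.0403 | 50.0000 | 1.6000 | 7.6000 |
| 3 | 3 | 13748 | 948.2500 | 95.0000 | 5.0000 | 439.0000 | 28.0000 | 33.8661 | 92.6667 | 0.0179 | 0.0000 | 1.6000 | 4.8000 |
| 4 | 4 | 15100 | 876.0000 | 333.0000 | 3.0000 | 80.0000 | 3.0000 | 292.0000 | 8.6000 | 0.0732 | 22.0000 | 0.6667 | 0.3333 |
| 5 | 5 | 15291 | 4,623.3000 | 25.0000 | 14.0000 | 2,102.0000 | 102.0000 | 45.3265 | 23.2000 | 0.0401 | 29.0000 | 1.2143 | 4.3571 |
| 6 | 6 | 14688 | 5,630.8700 | 7.0000 | 21.0000 | 3,621.0000 | 327.0000 | 17.2198 | 18.3000 | 0.0572 | 399.0000 | 1.1429 | 7.0476 |
| 7 | 7 | 17809 | 5,411.9100 | 16.0000 | 12.0000 | 2,057.0000 | 61.0000 | 88.7198 | 35.7000 | 0.0335 | 41.0000 | 1.9167 | 3.8333 |
| 8 | 8 | 15311 | 60,767.9000 | 0.0000 | 91.0000 | 38,194.0000 | 2,379.0000 | 25.5435 | 4.1444 | 0.2433 | 474.0000 | 0.4725 | 6.2308 |
| 9 | 9 | 16098 | 2,005.6300 | 87.0000 | 7.0000 | 613.0000 | 67.0000 | 29.9348 | 47.6667 | 0.0244 | 0.0000 | 2.1429 | 4.8571 |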

Last rows

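| | df_index | customer_id | gross_revenue | recency_days | qtde_invoices | qtde_items | qtde_products | avg_ticket | avg_recency_days | frequency | qtde_retruns | avg_basket_size | avg_unique_basket_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2959 | 5627 | 17727 | 1,060.2500 | 15.0000 | 1.0000 | 645.0000 | 66.0000 | 16.0644 | 6.0000 | 1.0000 | 6.0000 | 11.0000 | 66.0000 |
| 2960 | 5637 | 17232 | 421.5200 | 2.0000 | 2.0000 | 203.0000 | 36.0000 | 11.7089 | 12.0000 | 0.1538 | 0.0000 | 5.0000 | 15.0000 |
| 2961 | 5638 | 17468 | 137.0000 | 10.0000 | 2.0000 | 116.0000 | 5.0000 | 27.4000 | 4.0000 | 0.4000 | 0.0000 | 1.0000 | 2.5000 |
| 2962 | 5649 | 13596 | 697.0400 | 5.0000 | 2.0000 | 406.0000 | 166.0000 | 4.1990 | 7.0000 | 0.2500 | 0.0000 | 5.0000 | 66.5000 |
| 2963 | 5655 | 14893 | 1,237.8500 | 9.0000 | 2.0000 | 799.0000 | 73.0000 | 16.9568 | 2.0000 | 0.6667 | 0.0000 | 7.0000 | 36.0000 |
| 2964 | 5659 | 12479 | 473.2000 | 11.0000 | 1.0000 | 382.0000 | 30.0000 | 15.7733 | 4.0000 | 1.0000 | 34.0000 | 8.0000 | 30.0000 |
| 2965 | 5680 | 14126 | 706.1300 | 7.0000 | 3.0000 | 508.0000 | 15.0000 | 47.0753 | 3.0000 | 0.7500 | 50.0000 | 2.0000 | 4.6667 |
| 2966 | 5686 | 13521 | 1,092.3900 | 1.0000 | 3.0000 | 733.0000 | 435.0000 | 2.5112 | 4.5000 | 0.3000 | 0.0000 | 3.0000 | 104.0000 |
| 2967 | 5696 | 15060 | 301.8400 | 8.0000 | 4.0000 | 262.0000 | 120.0000 | 2.5153 | 1.0000 | 2.0000 | 0.0000 | 2.0000 | 20.0000 |
| 2968 | 5715 | 12558 | 269.9600 | 7.0000 | 1.0000 | 196.0000 | 11.0000 | 24.5418 | 6.0000 | 1.0000 | 196.0000 | 5.0000 | 11.0000 |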